home *** CD-ROM | disk | FTP | other *** search
- <?xml version="1.0" encoding="UTF-8" ?>
- <!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
- <?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
- <!-- $Revision: 1.3.2.6 $ -->
-
- <!--
- Copyright 2002-2004 The Apache Software Foundation
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
-
- <manualpage metafile="ebcdic.xml.meta">
- <parentdocument href="./">Platform Specific Notes</parentdocument>
-
- <title>The Apache EBCDIC Port</title>
-
- <summary>
-
- <note type="warning"><strong>Warning:</strong> This document
- has not been updated to take into account changes made in
- the 2.0 version of the Apache HTTP Server. Some of the
- information may still be relevant, but please use it with care.
- </note>
-
- </summary>
-
- <section id="overview">
-
- <title>Overview of the Apache EBCDIC Port</title>
-
- <p>Version 1.3 of the Apache HTTP Server is the first version
- which includes a port to a (non-ASCII) mainframe machine which
- uses the EBCDIC character set as its native codeset.</p>
-
- <p>(It is the SIEMENS family of mainframes running the <a
- href="http://www.siemens.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD
- operating system</a>. This mainframe OS nowadays features a
- SVR4-derived POSIX subsystem).</p>
-
- <p>The port was started initially to</p>
-
- <ul>
- <li>prove the feasibility of porting <a
- href="http://dev.apache.org/">the Apache HTTP server</a> to
- this platform</li>
-
- <li>find a "worthy and capable" successor for the venerable
- <a href="http://www.w3.org/Daemon/">CERN-3.0</a> daemon
- (which was ported a couple of years ago), and to</li>
-
- <li>prove that Apache's preforking process model can on this
- platform easily outperform the accept-fork-serve model used
- by CERN by a factor of 5 or more.</li>
- </ul>
-
- <p>This document serves as a rationale to describe some of the
- design decisions of the port to this machine.</p>
-
- </section>
-
- <section id="design">
-
- <title>Design Goals</title>
-
- <p>One objective of the EBCDIC port was to maintain enough
- backwards compatibility with the (EBCDIC) CERN server to make
- the transition to the new server attractive and easy. This
- required the addition of a configurable method to define
- whether a HTML document was stored in ASCII (the only format
- accepted by the old server) or in EBCDIC (the native document
- format in the POSIX subsystem, and therefore the only realistic
- format in which the other POSIX tools like <code>grep</code> or
- <code>sed</code> could operate on the documents). The current
- solution to this is a "pseudo-MIME-format" which is intercepted
- and interpreted by the Apache server (see below). Future versions
- might solve the problem by defining an "ebcdic-handler" for all
- documents which must be converted.</p>
-
- </section>
-
- <section id="technical">
-
- <title>Technical Solution</title>
-
- <p>Since all Apache input and output is based upon the BUFF
- data type and its methods, the easiest solution was to add the
- conversion to the BUFF handling routines. The conversion must
- be settable at any time, so a BUFF flag was added which defines
- whether a BUFF object has currently enabled conversion or not.
- This flag is modified at several points in the HTTP
- protocol:</p>
-
- <ul>
- <li><strong>set</strong> before a request is received
- (because the request and the request header lines are always
- in ASCII format)</li>
-
- <li><strong>set/unset</strong> when the request body is
- received - depending on the content type of the request body
- (because the request body may contain ASCII text or a binary
- file)</li>
-
- <li><strong>set</strong> before a reply header is sent
- (because the response header lines are always in ASCII
- format)</li>
-
- <li><strong>set/unset</strong> when the response body is sent
- - depending on the content type of the response body (because
- the response body may contain text or a binary file)</li>
- </ul>
-
- </section>
-
- <section id="porting">
-
- <title>Porting Notes</title>
-
- <ol>
- <li>
- <p>The relevant changes in the source are <code>#ifdef</code>'ed
- into two categories:</p>
-
- <dl>
- <dt><code><strong>#ifdef
- CHARSET_EBCDIC</strong></code></dt>
-
- <dd>
- <p>Code which is needed for any EBCDIC based machine.
- This includes character translations, differences in
- contiguity of the two character sets, flags which
- indicate which part of the HTTP protocol has to be
- converted and which part doesn't <em>etc.</em></p>
- </dd>
-
- <dt><code><strong>#ifdef _OSD_POSIX</strong></code></dt>
-
- <dd>
- <p>Code which is needed for the SIEMENS BS2000/OSD
- mainframe platform only. This deals with include file
- differences and socket implementation topics which are
- only required on the BS2000/OSD platform.</p>
- </dd>
- </dl>
- </li>
-
- <li>
- <p>The possibility to translate between ASCII and EBCDIC at
- the socket level (on BS2000 POSIX, there is a socket option
- which supports this) was intentionally <em>not</em> chosen,
- because the byte stream at the HTTP protocol level consists
- of a mixture of protocol related strings and non-protocol
- related raw file data. HTTP protocol strings are always
- encoded in ASCII (the <code>GET</code> request, any Header: lines,
- the chunking information <em>etc.</em>) whereas the file transfer
- parts (<em>i.e.</em>, GIF images, CGI output <em>etc.</em>)
- should usually be just "passed through" by the server. This
- separation between "protocol string" and "raw data" is
- reflected in the server code by functions like <code>bgets()</code>
- or <code>rvputs()</code> for strings, and functions like
- <code>bwrite()</code> for binary data. A global translation
- of everything would therefore be inadequate.</p>
-
- <p>(In the case of text files of course, provisions must be
- made so that EBCDIC documents are always served in
- ASCII)</p>
- </li>
-
- <li>
- <p>This port therefore features a built-in protocol level
- conversion for the server-internal strings (which the
- compiler translated to EBCDIC strings) and thus for all
- server-generated documents. The hard coded ASCII escapes
- <code>\012</code> and <code>\015</code> which are ubiquitous
- in the server code are an exception: they are already the binary
- encoding of the ASCII <code>\n</code> and <code>\r</code> and
- must not be converted to ASCII a second time.
- This exception is only relevant for server-generated strings;
- and <em>external</em> EBCDIC documents are not expected to
- contain ASCII newline characters.</p>
- </li>
-
- <li>
- <p>By examining the call hierarchy for the BUFF management
- routines, I added an "ebcdic/ascii conversion layer" which
- would be crossed on every puts/write/get/gets, and a
- conversion flag which allowed enabling/disabling the
- conversions on-the-fly. Usually, a document crosses this
- layer twice from its origin source (a file or CGI output) to
- its destination (the requesting client): <code>file ->
- Apache</code>, and <code>Apache -> client</code>.</p>
-
- <p>The server can now read the header lines of a CGI-script
- output in EBCDIC format, and then find out that the remainder
- of the script's output is in ASCII (like in the case of the
- output of a WWW Counter program: the document body contains a
- GIF image). All header processing is done in the native
- EBCDIC format; the server then determines, based on the type
- of document being served, whether the document body (except
- for the chunking information, of course) is in ASCII already
- or must be converted from EBCDIC.</p>
- </li>
-
- <li>
- <p>For Text documents (MIME types text/plain, text/html
- <em>etc.</em>), an implicit translation to ASCII can be
- used, or (if the users prefer to store some documents in
- raw ASCII form for faster serving, or because the files
- reside on a NFS-mounted directory tree) can be served
- without conversion.</p>
-
- <p><strong>Example:</strong></p>
-
- <p>to serve files with the suffix <code>.ahtml</code> as a
- raw ASCII <code>text/html</code> document without implicit
- conversion (and suffix <code>.ascii</code> as ASCII
- <code>text/plain</code>), use the directives:</p>
-
- <example>
- AddType text/x-ascii-html .ahtml <br />
- AddType text/x-ascii-plain .ascii
- </example>
-
- <p>Similarly, any <code>text/foo</code> MIME type can be
- served as "raw ASCII" by configuring a MIME type
- "<code>text/x-ascii-foo</code>" for it using
- <code>AddType</code>.</p>
- </li>
-
- <li>
- <p>Non-text documents are always served "binary" without
- conversion. This seems to be the most sensible choice for,
- .<em>e.g.</em>, GIF/ZIP/AU file types. This of course
- requires the user to copy them to the mainframe host using
- the "<code>rcp -b</code>" binary switch.</p>
- </li>
-
- <li>
- <p>Server parsed files are always assumed to be in native
- (<em>i.e.</em>, EBCDIC) format as used on the machine, and
- are converted after processing.</p>
- </li>
-
- <li>
- <p>For CGI output, the CGI script determines whether a
- conversion is needed or not: by setting the appropriate
- Content-Type, text files can be converted, or GIF output can
- be passed through unmodified. An example for the latter case
- is the wwwcount program which we ported as well.</p>
- </li>
-
- </ol>
-
- </section>
-
- <section id="document">
-
- <title>Document Storage Notes</title>
-
- <section id="binary">
-
- <title>Binary Files</title>
-
- <p>All files with a <code>Content-Type:</code> which does not
- start with <code>text/</code> are regarded as <em>binary
- files</em> by the server and are not subject to any conversion.
- Examples for binary files are GIF images, gzip-compressed files
- and the like.</p>
-
- <p>When exchanging binary files between the mainframe host and
- a Unix machine or Windows PC, be sure to use the ftp "binary"
- (<code>TYPE I</code>) command, or use the
- <code>rcp -b</code> command from the mainframe host (the
- <code>-b</code> switch is not supported in unix
- <code>rcp</code>'s).</p>
-
- </section>
-
- <section id="text">
-
- <title>Text Documents</title>
-
- <p>The default assumption of the server is that Text Files
- (<em>i.e.</em>, all files whose <code>Content-Type:</code>
- starts with <code>text/</code>) are stored in the native
- character set of the host, EBCDIC.</p>
-
- </section>
-
- <section id="ssi">
-
- <title>Server Side Included Documents</title>
-
- <p>SSI documents must currently be stored in EBCDIC only.
- No provision is made to convert it from ASCII before
- processing.</p>
-
- </section>
-
- </section>
-
- <section id="modules">
-
- <title>Apache Modules' Status</title>
-
- <table border="1">
- <tr>
- <th>Module</th>
- <th>Status</th>
- <th>Notes</th>
- </tr>
-
- <tr>
- <td><module>core</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_access</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_actions</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_alias</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_asis</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_auth</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_auth_anon</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_auth_dbm</module></td>
- <td class="centered">?</td>
- <td>with own <code>libdb.a</code></td>
- </tr>
-
- <tr>
- <td><module>mod_autoindex</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_cern_meta</module></td>
- <td class="centered">?</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_cgi</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><code>mod_digest</code></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_dir</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_so</module></td>
- <td class="centered">-</td>
- <td>no shared libs</td>
- </tr>
-
- <tr>
- <td><module>mod_env</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_example</module></td>
- <td class="centered">-</td>
- <td>(test bed only)</td>
- </tr>
-
- <tr>
- <td><module>mod_expires</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_headers</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_imap</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_include</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_info</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><code>mod_log_agent</code></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_log_config</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><code>mod_log_referer</code></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_mime</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_mime_magic</module></td>
- <td class="centered">?</td>
- <td>not ported yet</td>
- </tr>
-
- <tr>
- <td><module>mod_negotiation</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_proxy</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_rewrite</module></td>
- <td class="centered">+</td>
- <td>untested</td>
- </tr>
-
- <tr>
- <td><module>mod_setenvif</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_speling</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_status</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_unique_id</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_userdir</module></td>
- <td class="centered">+</td>
- <td></td>
- </tr>
-
- <tr>
- <td><module>mod_usertrack</module></td>
- <td class="centered">?</td>
- <td>untested</td>
- </tr>
- </table>
-
- </section>
-
- <section id="third-party">
-
- <title>Third Party Modules' Status</title>
-
- <table border="1">
- <tr>
- <th>Module</th>
- <th>Status</th>
- <th>Notes</th>
- </tr>
-
- <tr>
- <td><code><a href="http://java.apache.org/">mod_jserv</a>
- </code></td>
- <td class="centered">-</td>
- <td>JAVA still being ported.</td>
- </tr>
-
- <tr>
- <td><code><a href="http://www.php.net/">mod_php3</a></code></td>
- <td class="centered">+</td>
- <td><code>mod_php3</code> runs fine, with LDAP and GD
- and FreeType libraries.</td>
- </tr>
-
- <tr>
- <td><code><a
- href="http://hpwww.ec-lyon.fr/~vincent/apache/mod_put.html"
- >mod_put</a></code></td>
- <td class="centered">?</td>
- <td>untested</td>
- </tr>
-
- <tr>
- <td><code><a href="ftp://hachiman.vidya.com/pub/apache/"
- >mod_session</a></code></td>
- <td class="centered">-</td>
- <td>untested</td>
- </tr>
- </table>
-
- </section>
-
- </manualpage>
-